11 research outputs found

Reliability-oriented resource management for High-Performance Computing

Author: Agosta Giovanni
Campi Alessandro
Ciesielski Sebastian
Fornaciari William
Kulczewski Michal
Massari Giuseppe
Peta Miriam
Piatek Wojciech
Reghenzani Federico
Terraneo Federico
Publication venue: 'Elsevier BV'
Publication date: 01/01/2023

Reliability is an increasingly pressing issue for High-Performance Computing systems, as failures are a threat to large-scale applications, for which even a single run may incur significant energy and billing costs. Currently, application developers need to address reliability explicitly, by integrating application-specific checkpoint/restore mechanisms. However, the application alone cannot exploit system knowledge, whereas system-wide resource management systems can. In this paper, we propose a reliability-oriented policy that can significantly increase component reliability by combining the exploitation of checkpoint/restore mechanisms with proactive resource management policies.

Archivio istituzionale della ricerca - Politecnico di Milano

Energy-efficient SCalable Algorithms for weather Prediction at Exascale

Author: Baldauf Michael
Bauer Peter
Benard Pierre
Fuhrer Oliver
HansenSass Bent
Kulczewski Michal
McKinstry Alastair
Messmer Peter
New Nick
Szmelter Joanna
Termonia Piet
Vigouroux Xavier
Wedi Nils
Publication venue: 'Science Impact, Ltd.'
Publication date: 01/01/2017

Ghent University Academic Bibliography

Tutorial applications for Verification, Validation and Uncertainty Quantification using VECMA toolkit

Author: Arabnejad H. (Hamid)
Coster D.P. (David)
Coveney P.V. (Peter)
Crommelin D.T. (Daan)
Edeling W.N. (Wouter)
Groen D. (Derek)
Hoekstra A.G. (Alfons)
Jancauskas V. (Vytautas)
Krzhizhanovskaya V.V.
Kulczewski M. (Michal)
Lakhlili J. (Jalal)
Luk O.O. (Onnie)
Suleimenova D. (Diana)
Veen L. (Lourens)
Ye D.
Zun P. (Pavel)
Publication venue: 'Elsevier BV'
Publication date: 01/07/2021

The VECMA toolkit (VECMAtk) enables automated Verification, Validation and Uncertainty Quantification (VVUQ) for complex applications that can be deployed on emerging exascale platforms and provides support for software applications for any domain of interest. The toolkit has four main components: EasyVVUQ for VVUQ workflows, FabSim3 for automation and tool integration, MUSCLE3 for coupling multiscale models and QCG tools to execute application workflows on high performance computing (HPC). A more recent addition to the VECMAtk is EasySurrogate for various types of surrogate methods. In this paper, we present five tutorials from different application domains that apply these VECMAtk components to perform uncertainty quantification analysis, use surrogate models, couple multiscale models and execute sensitivity analysis on HPC. This paper aims to provide hands-on experience for practitioners aiming to test and contrast these tools with their own applications.

CWI's Institutional Repository

TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale

Author: Agosta Giovanni
Aldinucci Marco
Alvarez Carlos
Ammendola Roberto
Arfat Yasir
Beaumont Olivier
Bernaschi Massimo
Biagioni Andrea
Boccali Tommaso
Bramas Bérenger
Brandolese Carlo
Cantalupo Barbara
Cattaneo Daniele
Celino Massimo
Colonnelli Iacopo
Cretaro Paolo
d'Ambra Pasqua
Danelutto Marco
Esposito Roberto
Eyraud-Dubois Lionel
Filgueras Antonio
Fornaciari William
Frezza Ottorino
Galimberti Andrea
Giacomini Francesco
Goglin Brice
Guermouche Abdou
Iannone Francesco
Kulczewski Michal
Lo Cicero Francesca
Lonardo Alessandro
Martinelli Alberto
Martorell Xavier
Massari Giuseppe
Mittone Gianluca
Montangero Simone
Namyst Raymond
Oleksiak Ariel
Palazzari Paolo
Reghenzani Federico
Saponara Sergio
Simula Francesco
Paolucci Pier Stanislao
Terraneo Federico
Thibault Samuel
Torquati Massimo
Turisini Matteo
Vicini Piero
Vidal Miquel
Zoni Davide
Zummo Giuseppe
Publication venue: HAL CCSD
Publication date: 01/09/2021

To achieve high performance and high energy efficiency on near-future exascale computing systems, three key technology gaps need to be bridged. These gaps include: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetics; and methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA aims at tackling these gaps through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models and tools derived from European research.

INRIA a CCSD electronic archive server

Methods to Load Balance a GCR Pressure Solver Using a Stencil Framework on Multi- and Many-Core Architectures

Author: Krzysztof Kurowski
Michal Kulczewski
Milosz Ciznicki
Piotr Kopta
Publication venue: Hindawi Limited
Publication date: 01/01/2015

The recent advent of novel multi- and many-core architectures forces application programmers to deal with hardware-specific implementation details and to be familiar with software optimisation techniques to benefit from new high-performance computing machines. Extra care must be taken for communication-intensive algorithms, which may be a bottleneck for the forthcoming era of exascale computing. This paper aims to present a high-level stencil framework implemented for the EULerian or LAGrangian model (EULAG) that efficiently utilises multi- and many-core architectures. Only an efficient usage of both many-core processors (CPUs) and graphics processing units (GPUs) with the flexible data decomposition method can lead to the maximum performance that scales the communication-intensive Generalized Conjugate Residual (GCR) elliptic solver with preconditioner.

Directory of Open Access Journals

Challenges in deeply heterogeneous high performance systems

Author: Agosta Giovanni
Atienza David
Canal Ramon
Cilardo Alessandro
Flich Jose
Fornaciari William
Gavilá Rafael Tornero
Kulczewski Michal
Luz Carles Hernandez
Massari Giuseppe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Archivio della ricerca - Università degli studi di Napoli Federico II

Challenges in Deeply Heterogeneous High Performance Systems

Author: Agosta Giovanni
Atienza Alonso David
Canal Ramon
Cilardo Alessandro
Flich Cardo José
Fornaciari William
Hernandez Luz Carles
Kulczewski Michal
Massari Giuseppe
Tornero Gavilá Rafael
Zapater Sancho Marina
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

RECIPE (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems) is a recently started project funded within the H2020 FETHPC programme, which is expressly targeted at exploring new High-Performance Computing (HPC) technologies. RECIPE aims at introducing a hierarchical runtime resource management infrastructure to optimize energy efficiency and minimize the occurrence of thermal hotspots, while enforcing the time constraints imposed by the applications and ensuring reliability for both time-critical and throughput-oriented computation that run on deeply heterogeneous accelerator-based systems. This paper presents a detailed overview of RECIPE, identifying the fundamental challenges as well as the key innovations addressed by the project, which span run-time management, heterogeneous computing architectures, HPC memory/interconnection infrastructures, thermal modelling, reliability, programming models, and timing analysis. For each of these areas, the paper describes the relevant state of the art as well as the specific actions that the project will take to effectively address the identified technological challenges.

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Archivio della ricerca - Università degli studi di Napoli Federico II

UPCommons. Portal del coneixement obert de la UPC

The RECIPE approach to challenges in deeply heterogeneous high performance systems

Author: Agosta Giovanni
Atienza David
Canal Ramon
Cilardo Alessandro
Flich Cardo José
Fornaciari William
Hernández Luz Carles
Kulczewski Michal
Massari Giuseppe
Tornero-Gavilá Rafael
Zapater Marina
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

RECIPE (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems) is a recently started project funded within the H2020 FETHPC programme, which is expressly targeted at exploring new High-Performance Computing (HPC) technologies. RECIPE aims at introducing a hierarchical runtime resource management infrastructure to optimize energy efficiency and minimize the occurrence of thermal hotspots, while enforcing the time constraints imposed by the applications and ensuring reliability for both time-critical and throughput-oriented computation that run on deeply heterogeneous accelerator-based systems. This paper presents a detailed overview of RECIPE, identifying the fundamental challenges as well as the key innovations addressed by the project. In particular, the need for predictive reliability approaches to maximize hardware lifetime and guarantee application performance is identified as the key concern for RECIPE. We address it through hierarchical resource management of the heterogeneous architectural components of the system, driven by estimates of the application latency and hardware reliability obtained respectively through timing analysis and modeling thermal properties and mean-time-to-failure of subsystems. We show the impact of prediction accuracy on the overheads imposed by the checkpointing policy, as well as a possible application to a weather forecasting use case.

The activities described in this article received funding from the European Union's Horizon 2020 research and innovation programme under the FETHPC grant agreement no. 801137 RECIPE: REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems.

Agosta, G.; Fornaciari, W.; Atienza, D.; Canal, R.; Cilardo, A.; Flich Cardo, J.; Hernández Luz, C.... (2020). The RECIPE approach to challenges in deeply heterogeneous high performance systems. Microprocessors and Microsystems. 77:1-13. https://doi.org/10.1016/j.micpro.2020.103185

Infoscience - École polytechnique fédérale de Lausanne

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Archivio della ricerca - Università degli studi di Napoli Federico II

UPCommons. Portal del coneixement obert de la UPC

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

RiuNet

Challenges in deeply heterogeneous high performance systems

Author: Agosta Giovanni
Atienza David
Canal Corretger Ramon
Cilardo Alessandro
Flich Cardo José
Fornaciari William
Hernández Luz Carles
Kulczewski Michal
Massari Giuseppe
Tornero Gavilá Rafael
Zapater Sancho Marina
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

RECIPE (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems) is a recently started project funded within the H2020 FETHPC programme, which is expressly targeted at exploring new High-Performance Computing (HPC) technologies. RECIPE aims at introducing a hierarchical runtime resource management infrastructure to optimize energy efficiency and minimize the occurrence of thermal hotspots, while enforcing the time constraints imposed by the applications and ensuring reliability for both time-critical and throughput-oriented computation that run on deeply heterogeneous accelerator-based systems. This paper presents a detailed overview of RECIPE, identifying the fundamental challenges as well as the key innovations addressed by the project, which span run-time management, heterogeneous computing architectures, HPC memory/interconnection infrastructures, thermal modelling, reliability, programming models, and timing analysis. For each of these areas, the paper describes the relevant state of the art as well as the specific actions that the project will take to effectively address the identified technological challenges.

Peer Reviewed

RECERCAT

Towards EXtreme scale technologies and accelerators for euROhpc hw/Sw supercomputing applications for exascale: The TEXTAROSSA approach

Author: Agosta Giovanni
Aldinucci Marco
Alvarez Carlos
Ammendola Roberto
Arfat Yasir
Beaumont Olivier
Bernaschi Massimo
Biagioni Andrea
Boccali Tommaso
Bramas Berenger
Brandolese Carlo
Cantalupo Barbara
Carrozzo Mauro
Cattaneo Daniele
Celestini Alessandro
Celino Massimo
Colonnelli Iacopo
Cretaro Paolo
Danelutto Marco
D’Ambra Pasqua
Esposito Roberto
Eyraud-Dubois Lionel
Filgueras Antonio
Fornaciari William
Frezza Ottorino
Galimberti Andrea
Giacomini Francesco
Goglin Brice
Gregori Daniele
Guermouche Abdou
Iannone Francesco
Kulczewski Michal
Lo Cicero Francesca
Lonardo Alessandro
Martinelli Alberto R.
Martinelli Michele
Martorell Xavier
Massari Giuseppe
Mittone Gianluca
Montangero Simone
Namyst Raymond
Oleksiak Ariel
Palazzari Paolo
Paolucci Pier Stanislao
Reghenzani Federico
Rossi Cristian
Saponara Sergio
Simula Francesco
Terraneo Federico
Thibault Samuel
Torquati Massimo
Turisini Matteo
Vicini Piero
Vidal Miquel
Zoni Davide
Zummo Giuseppe
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022

In the near future, Exascale systems will need to bridge three technology gaps to achieve high performance while remaining under tight power constraints: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetic; and methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA addresses these gaps through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models, and tools derived from European research.

Archivio istituzionale della ricerca - Politecnico di Milano

INRIA a CCSD electronic archive server